Estimating the crowd count in densely crowded scenes is an extremely challenging task due to non-uniform scale variations. In this paper, we propose a novel end-to-end cascaded network of CNNs that jointly learns crowd count classification and density map estimation. Classifying the crowd count into various groups is tantamount to coarsely estimating the total count in the image, thereby incorporating a high-level prior into the density estimation network. This enables the layers in the network to learn globally relevant, discriminative features, which aid in estimating highly refined density maps with lower count error. The joint training is performed in an end-to-end fashion. Extensive experiments on highly challenging, publicly available datasets show that the proposed method achieves lower count error and better-quality density maps than recent state-of-the-art methods.
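The joint objective described above can be sketched as a framework-free illustration. This is not the authors' implementation, and several details are assumptions. The count-group label is derived by splitting `[0, max_count]` into equal-width bins. The density term is a mean squared error. The classification term is softmax cross-entropy. The two terms are summed with a hypothetical weight `lam`. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]


def count_from_density(density: Grid) -> float:
    """Total count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


def count_to_group(count: float, max_count: float, num_groups: int) -> int:
    """Hypothetical binning: equal-width groups over [0, max_count].

    Counts at or above max_count fall into the last group.
    """
    if count >= max_count:
        return num_groups - 1
    return int(count * num_groups / max_count)


def density_loss(pred: Grid, target: Grid) -> float:
    """Assumed density term: mean squared error over all pixels."""
    total, n = 0.0, 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            total += (p - t) ** 2
            n += 1
    return total / n


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for the count-group classifier (log-sum-exp stabilized)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def joint_loss(pred_density: Grid, gt_density: Grid, group_logits: Sequence[float],
               max_count: float, num_groups: int, lam: float) -> float:
    """Density loss plus lam-weighted classification loss (lam is a hypothetical weight).

    The coarse count-group label (the high-level prior) is derived from the
    ground-truth density map, so no extra annotation is needed.
    """
    label = count_to_group(count_from_density(gt_density), max_count, num_groups)
    return density_loss(pred_density, gt_density) + lam * cross_entropy(group_logits, label)


if __name__ == "__main__":
    gt = [[0.5, 0.5], [1.0, 0.0]]    # toy ground-truth density map, count = 2
    pred = [[0.4, 0.6], [0.9, 0.1]]  # toy predicted density map
    logits = [0.1, 2.0, 0.3, -1.0, 0.0]  # toy scores for 5 count groups
    print(joint_loss(pred, gt, logits, max_count=10.0, num_groups=5, lam=0.01))
```

Deriving the group label from the ground-truth density map reflects the idea that classifying the count is a coarse estimate of the same quantity the density map encodes. In a real network, both heads would be trained jointly by backpropagating through this combined loss.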